Abstract: For the first time, this paper systematically
identifies three categories of throughput-oriented workloads
in data centers: services, data processing applications, and
interactive real-time applications. Their targets are to
increase the volume of throughput in terms of processed
requests, processed data, or the maximum number of
simultaneous subscribers supported, respectively. We coin a
new term, high volume throughput computing (HVC for short),
to describe these workloads and the data center systems
designed for them. We characterize HVC and compare it with
other computing paradigms, e.g., high throughput computing,
warehouse-scale computing, and cloud computing, in terms of
levels, workloads, metrics, coupling degree, data scales,
and number of jobs or service instances. We also report
preliminary results of our ongoing work on metrics and
benchmarks for HVC systems, which is the foundation for
designing innovative data center systems for HVC workloads.

Index Terms: High volume throughput computing;
Throughput-oriented workloads; Data center systems;
Metrics; Benchmarks

I. Introduction 

In the past decade, three trends have emerged in
computing. First, more and more services involving large
amounts of data are deployed in data centers to serve the
masses, e.g., the Google search engine and Google Maps.
Second, massive data are produced, stored, and analyzed in
real time or offline. According to IDC's annual survey of
global digital output, the total amount of global data
passed 1.2 zettabytes in 2010. In this paper, we call
applications that produce, store, and analyze massive data
data processing applications; they are also referred to as
big data applications. Third, many users use streaming
media or VoIP for entertainment or communication. Unlike
an ordinary Web server, a VoIP application maintains a
long user session (e.g., more than five minutes) while
guaranteeing real-time quality of service; we call such an
application an interactive real-time application.

The workloads mentioned above consist not of one big job
but of a large number of loosely coupled ones. This class
of workloads is throughput-oriented by nature, and the
target of data center systems designed for them is to
increase the volume of throughput in terms of processed
requests (for services), processed data (for data
processing applications), or the maximum number of
simultaneous subscribers (for interactive real-time
applications) performed or supported in data centers. To
draw attention to this class of workloads and the systems
designed for them, we coin a new term, high volume
throughput computing (HVC for short; nature: throughput
computing; target: high volume), to describe this class of
workloads and the data center systems designed for them.

In this paper, we identify three categories of workloads
in HVC: services, data processing applications, and
interactive real-time applications, all of which are
throughput-oriented. A service is a group of applications
that collaborate to receive user requests and return
responses to end users. More and more emerging services
are data-intensive, e.g., the Google search engine or
Google Maps. Data processing applications produce, store,
and analyze massive data. We focus only on loosely coupled
data processing applications, typified by MapReduce-based
or Dryad-based computing, and exclude tightly coupled
data-intensive computing, e.g., applications written in
MPI. We also include in this second category data stream
applications that process continuous unbounded streams of
data in real time. Unlike an ordinary Web server, an
interactive real-time application maintains a long user
session while guaranteeing real-time quality of service.
Typical interactive real-time applications include
streaming media, desktop clouds [33], and Voice over IP
(VoIP) applications. The details of the three categories
of workloads can be found in Section II-A.

Although several computing paradigms are not formally or
clearly defined, e.g., warehouse-scale computing (WSC)
[4], data-intensive scalable computing (DISC) [28], and
cloud computing, we compare HVC with them along six
dimensions: levels, workloads, metrics, coupling degree,
data scales, and number of jobs or service instances, as
shown in Table I. Our definition of HVC targets data
center systems, while many task computing [14] and high
throughput computing [26] are defined for runtime systems.
In terms of workloads and respective metrics, both high
throughput computing and high performance computing are
about scientific computing centered on floating point
operations, while most HVC applications have few floating
point operations, as uncovered in our preliminary work
[3]. Meanwhile, we also notice that many emerging
workloads fall into one or two categories of HVC
workloads, e.g., WSC (the first and second categories) and
DISC (the second category). As for (public) cloud
computing, we believe it is basically a business model of
renting computing or storage resources that relies heavily
on virtualization technologies, while HVC is defined in
terms of workloads. We think many well-known cloud
workloads [5] can be included in HVC, but HPC-in-cloud
workloads [1] [38] are excluded, since they are tightly
coupled.

After a broad survey of previous benchmarks, we found no
systematic work on benchmarking the three categories of
throughput-oriented workloads that we identify. We present
our preliminary work on metrics and benchmarks for HVC
systems.

The remainder of the paper is organized as follows.
Section II characterizes and compares different computing
paradigms. Section III revisits previous benchmarks and
reports our preliminary work on HVC metrics and
benchmarks. Section IV concludes.

II. Characterizing computing paradigms 

In this section, we define HVC and identify how it
differs from other computing paradigms.

A. What is HVC? 

HVC is a data center based computing paradigm 
focusing on throughput-oriented workloads. The 
target of a data center system designed for HVC 
workloads is to increase the volume of throughput 
in terms of requests, or processed data, or the max- 
imum number of simultaneous subscribers, which 
are performed or supported in a data center. 

In Table I, we characterize HVC from six dimen- 
sions: levels, workloads, metrics, coupling degree, 
data scales, and number of jobs or service instances. 

An HVC system is defined at the data center level. We
identify three categories of workloads in HVC: services,
data processing applications, and interactive real-time
applications. Services are the first category of HVC
workloads. A service is a group of applications that
collaborate to receive user requests and return responses
to end users. We call a group of applications that
independently process requests a service instance. For a
large Internet service, a large number of service
instances are deployed, with load balancers distributing
requests among them. Since each request is independent, a
service is itself loosely coupled. For an ordinary Apache
Web server, the data scale is small, while for a search
engine such as the one provided by Google, the data scale
is large. More and more emerging services are
data-intensive.
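
As a rough illustration of why a service is loosely coupled, the sketch below distributes independent requests across service instances through a load balancer. The names `ServiceInstance` and `RoundRobinBalancer`, and the round-robin policy itself, are assumptions made for this example; the paper does not prescribe a particular balancing policy.

```python
from itertools import cycle


class ServiceInstance:
    """Hypothetical service instance that handles each request on its own."""

    def __init__(self, name):
        self.name = name
        self.handled = 0

    def handle(self, request):
        # No state is shared with other instances, so any instance can
        # serve any request; this is what makes the service loosely coupled.
        self.handled += 1
        return f"{self.name}:{request}"


class RoundRobinBalancer:
    """Assumed policy: hand requests to the instances in rotation."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._rotation = cycle(self.instances)

    def dispatch(self, request):
        return next(self._rotation).handle(request)


instances = [ServiceInstance(f"instance-{i}") for i in range(3)]
balancer = RoundRobinBalancer(instances)
responses = [balancer.dispatch(f"req-{i}") for i in range(6)]
```

Because no request depends on another, adding instances behind the balancer raises the number of requests served per minute without changing the instances themselves.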

The second category of HVC workloads is data processing
applications. Note that we include only loosely coupled
data-intensive computing, e.g., MapReduce jobs, in HVC,
and exclude data-intensive MPI applications. At run time,
MapReduce



TABLE I: Characterizing different computing paradigms.

| Computing paradigm | Level | Workloads | Metrics | Coupling degree | Data scale | # jobs or service instances |
|---|---|---|---|---|---|---|
| High performance computing | Super computers | Scientific computing: heroic MPI applications | Floating point operations per second | Tight | n/a | Low |
| High performance throughput computing [27] | Processors | Traditional server workloads | Overall work performed over a fixed time period | Loose | n/a | Low |
| High throughput computing [26] | Distributed runtime systems | Scientific computing | Floating point operations per month | Loose | n/a | Medium |
| Many task computing [14] | Runtime systems | Scientific computing or data analysis: workflow jobs | Tasks per second | Tight or loose | n/a | Large |
| Data-intensive scalable computing [28] or data center computing [29] | Runtime systems | Data analysis: MapReduce-like jobs | n/a | Loose | Large | Large |
| Warehouse-scale computing [4] | Data centers for Internet services, belonging to a single organization | Very large Internet services | n/a | Loose | Large | Large |
| Cloud computing [34] [15] | Hosted data centers | SaaS + utility computing | n/a | Loose | n/a | Large |
| High volume throughput computing (HVC) | Data centers | Services | Requests per minute and requests per joule | Loose | Medium | Large |
| | | Data processing applications | Data processed per minute and data processed per joule | Loose | Large | Large |
| | | Interactive real-time applications | Maximum number of simultaneous subscribers and subscribers per watt | Loose | From medium to large | Large |


tasks are independent, which differs significantly from
batch jobs in programming models like MPI, in which tasks
execute concurrently and communicate during their
execution [32]. The data scale of this category of
workloads is large (they are also referred to as big data
applications) and hence produces a large number of tasks.
We also include data stream applications in this category
of HVC workloads. For example, S4 is a platform that
allows programmers to easily develop applications for
processing continuous unbounded streams of data [37].
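
The independence of MapReduce-style tasks can be sketched with a toy word count. This is a minimal illustration using only the Python standard library, not the API of Hadoop or any other real MapReduce runtime; the function names and the thread pool are assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_task(split):
    """Count words in one input split; needs no data from other tasks."""
    return Counter(split.split())


def reduce_task(partial_counts):
    """Merge the per-split counts into one result."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def word_count(splits, workers=4):
    # Map tasks never communicate with each other, so they can run in any
    # order and on any worker; only the reduce step combines their outputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_task, splits))
    return reduce_task(partials)


splits = ["big data big jobs", "data center data"]
totals = word_count(splits)
```

The result does not depend on the number of workers, which is the property that separates this style of job from an MPI program whose tasks exchange messages while running.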

The third category of HVC workloads is interactive
real-time applications. Unlike an ordinary Web server, an
interactive real-time application maintains a long user
session while guaranteeing real-time quality of service.
Typical interactive real-time applications include
streaming media (multimedia that is constantly delivered
to an end user by a service provider), desktop clouds
[33], and Voice over IP (VoIP) applications. For this
category of applications, the workload is loosely coupled
because requests or desktop applications are independent;
the data scale varies from medium to large, and the number
of tasks or service instances is large.

B. Identifying differences of HVC 

1) High performance computing: HVC differs from high
performance computing (HPC for short) in two ways. First,
the workloads are different. An HPC workload is mainly
scientific computing, usually a large-scale heroic MPI
application, which is tightly coupled, while an HVC
workload is loosely coupled and commonly composed of a
large number of jobs or service instances. Second, the
metric is different. The metric of HPC is floating point
operations per second. However, in HVC, most workloads,
e.g., the web search engine reported in [3], have few
floating point operations.

2) High throughput computing: Livny et al. refer to
environments that can deliver large amounts of processing
capacity over very long periods of time as high throughput
computing [26]. HVC differs from it in three ways. First,
high throughput computing is defined at the level of
distributed runtime systems, while HVC is defined at the
level of data center systems. Second, the workloads of
high throughput computing are scientific computing, while
HVC includes three categories of applications. Third, the
metric of high throughput computing is floating point
operations per month or year. However, in HVC, major
workloads, e.g., web search engines [3], have few floating
point operations.

3) Many task computing: According to [14], many task
computing differs from high throughput computing in its
emphasis on using a large number of computing resources
over short periods of time to accomplish many
computational tasks, where primary metrics are measured in
seconds (e.g., FLOPS, tasks/sec, MB/s I/O rates) as
opposed to operations per month. In terms of workloads,
many task computing denotes high-performance computations
comprising multiple distinct activities, coupled via file
system operations [14]. The set of tasks may be loosely
coupled or tightly coupled.

HVC differs from many task computing as follows. First,
the concept of many task computing targets runtime
systems, while HVC is defined at the level of data center
systems. Second, the workloads and respective metrics are
different. Many task computing includes tightly or loosely
coupled applications from scientific computing or data
analysis domains, while HVC includes three categories of
loosely coupled applications.

4) High performance throughput computing: Chaudhry et al.
[27] call the combination of high-end single-thread
performance and a high degree of multicore and
multithreading high performance throughput computing.
According to [27], systems designed for throughput
computing emphasize the aggregate amount of computation
performed by all functional units, threads, cores, chips,
coprocessors, and network interface cards in a system over
a fixed time period, as opposed to focusing on a speed
metric describing how fast a single core or thread
executes a benchmark.

Unlike HVC, which is defined at the data center level,
high performance throughput computing is defined at the
level of processors and targets traditional server
workloads, e.g., the database workloads, TPC-C,
SPECint2000, and SPECfp2000 exemplified in [27]. We also
identify three categories of workloads in HVC. In terms of
workloads, high performance throughput computing mainly
focuses on the first category of HVC applications
(excluding data-intensive services like search engines),
in addition to floating point operation-intensive
benchmarks like SPECfp2000. Moreover, we define different
metrics for the three categories of HVC applications in
Section III-B.

5) Data-intensive scalable computing (DISC for short) or
data center computing: Bryant et al. did not formally
define DISC [28]. Instead, they characterized DISC by
comparing it with cloud computing and high performance
computing in [28]. First, cloud computing targets hosted
services, e.g., web-based email services, while DISC
refers to a very large, shared data repository enabling
complex analysis [28]. Second, from the perspective of the
programming model, an HPC program is described at a very
low level, specifying detailed control of processing and
communications, while DISC applications are written in
terms of high-level operations on data, and the runtime
system controls scheduling and load balancing [28]. In
general, a DISC application is written in the MapReduce
programming model, which splits jobs into small tasks that
are run on the cluster's compute nodes. No one has
formally defined data center computing. In [29], it refers
to computing performed with frameworks such as MapReduce,
Hadoop, and Dryad. Through these frameworks, computation
can be performed on large datasets in a fault-tolerant
way, while hiding the complexities of the distributed
nature of the cluster [29]. According to [29], data center
computing shares the same view as DISC.

Basically, DISC or data center computing falls into the
second category of HVC workloads. HVC differs from them in
two respects. First, HVC is defined for data center
systems rather than the MapReduce-like runtime systems
discussed in [28] [29]. Second, Bryant et al. do not
formally define metrics, while we define three different
metrics for evaluating data center systems for the three
categories of applications in Section III-B.

6) Warehouse-scale computing: According to [4], the trend
toward server-side computing and the exploding popularity
of Internet services have created a new class of computing
systems that Hoelzle et al. have named warehouse-scale
computers, or WSCs. The name is meant to call attention to
the most distinguishing feature of these machines: the
massive scale of their software infrastructure, data
repositories, and hardware platform [4]. A WSC has the
following characteristics [4]: it belongs to a single
organization, uses a relatively homogeneous hardware and
system software platform, and shares a common systems
management layer. Most importantly, WSCs run a smaller
number of very large Internet services [4].

HVC differs from WSC in two ways. First, WSC targets data
centers for Internet services that belong to a single
organization, while in HVC many small-to-medium scale
services from different organizations may be hosted in the
same data center, and their aggregate number is large.
Second, in terms of workloads, WSC workloads can be
included in HVC, while HVC covers more categories of
workloads, e.g., interactive real-time applications.
Moreover, we give specific metrics for the different
categories of workloads, while no metric is defined for
WSC in [4].

7) Cloud computing: Vaquero et al. [34] defined a cloud
as a large pool of easily usable and accessible
virtualized resources (such as hardware, development
platforms, and/or services). According to the widely cited
Berkeley technical report [15], when a cloud is made
available in a pay-as-you-go manner to the general public,
it is called a public cloud; the service being sold is
utility computing [15]. The term private cloud refers to
the internal data centers of a business or other
organization that are not made available to the general
public [15]; this is also characterized as warehouse-scale
computing in [4] or data center computing in [29].

Basically, we believe that (public) cloud computing is a
fine-grained pay-per-use business model of renting
computing or storage resources that relies heavily on
virtualization technologies, while HVC is defined in terms
of workloads. Although virtualization technologies do
change workloads, which is worth further investigation,
the well-known cloud workloads studied in [5], e.g., NoSQL
data serving, MapReduce, and Web search, can be included
in HVC. A new trend is that practitioners in HPC
communities advocate an innovative computing paradigm, HPC
in cloud [1] [38], which suggests renting virtual machines
from a public cloud to run parallel applications. Since
HPC-in-cloud workloads are tightly coupled, we exclude
them from HVC. Moreover, in the context of a business
model, no one has formally defined metrics for evaluating
a cloud from the perspectives of both hardware and
software systems.

C. Discussion 

Although several computing paradigms are not formally or
clearly defined, e.g., DISC, WSC, and cloud computing, we
characterize several new computing paradigms and compare
our definition of HVC with them.

We draw three conclusions in this section. First, the
definition of HVC targets data center systems, while many
task computing, high throughput computing, and DISC are
defined for runtime systems. Second, in terms of workloads
and respective metrics, both high throughput computing and
high performance computing are about scientific computing
centered on floating point operations, while most HVC
applications have few floating point operations. Third,
many emerging workloads fall into one or two categories of
HVC workloads, e.g., WSC (the first and second categories)
and DISC or data center computing (the second category).
With the exception of HPC-in-cloud workloads [1], the
well-known cloud workloads studied in [5] can be included
in HVC. Moreover, (public) cloud computing is basically a
business model of renting computing or storage resources,
while HVC is defined in terms of workloads.

III. Benchmarking HVC systems 

In this section, we revisit previous benchmarks, 
and present our current work on HVC metrics and 
benchmarks. 



A. Revisiting previous benchmarks and metrics 

Table II summarizes different benchmarks and 
their respective levels, workloads, and metrics. 

The LINPACK benchmark report [7] describes the
performance for solving a general dense matrix problem
Ax = b. Performance is often measured in terms of floating
point operations per second (flop/s). To evaluate the
ability to traverse the whole memory of a machine [20],
Murphy et al. have put together a benchmark they call
Graph 500, whose metric is traversed edges per second
(TEPS). Since energy efficiency is becoming more and more
important, the Green500 List ranks supercomputers used
primarily for scientific production codes according to the
amount of power needed to complete a fixed amount of work
[17]. Sun Microsystems also proposed the SWaP (space,
watts, and performance) metric [24] to evaluate enterprise
systems from the perspectives of both data center space
efficiency and power consumption.
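
The metrics in this paragraph reduce to simple ratios, sketched below. The operation count 2n^3/3 + 2n^2 is the figure conventionally credited for a dense n x n solve and is stated here as background knowledge, not taken from the paper; the choice of rack units as the unit of space in `swap` is likewise an assumption, and all function names are hypothetical.

```python
def linpack_flop_count(n):
    """Operations conventionally credited for solving a dense n x n
    system Ax = b: 2n^3/3 + 2n^2 (assumed convention)."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def flop_rate(n, seconds):
    """Floating point operations per second for one timed solve."""
    return linpack_flop_count(n) / seconds


def flops_per_watt(rate, avg_watts):
    """Energy-efficiency ratio in the style of the Green500 ranking."""
    return rate / avg_watts


def swap(performance, rack_units, watts):
    """SWaP = performance / (space * watts), as listed in Table II.
    Measuring space in rack units is an assumption of this sketch."""
    return performance / (rack_units * watts)
```

For instance, a 3 x 3 solve is credited with 36 operations, so a run that takes 2 seconds scores 18 flop/s under this convention.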

Rivoire et al. [18] proposed the JouleSort benchmark. The
external sort from the sort benchmark specification
(http://research.microsoft.com/research/barc/SortBenchmark/default.htm)
is chosen as the benchmark's workload. The metric is
records sorted per joule. SPECpower_ssj2008 is the first
industry-standard SPEC benchmark that evaluates the power
and performance characteristics of volume server class and
multi-node class computers [36]. The initial benchmark
addresses only the performance of server-side Java
(SPECjbb2005). The metric is performance per watt, in
terms of ssj_ops/watt.

TPC defined transaction processing and database 
benchmarks, the goal of which is to define a set of 
functional requirements that can be run on any trans- 
action processing system, regardless of hardware or 
operating system [35]. Most of the TPC benchmarks 
are obsolete, and only three are still used: TPC- 
C, TPC-E, and TPC-H. TPC-C centers around the 
principal activities (transactions) of an order-entry 
environment [35]. TPC-E models a brokerage firm 
with customers who generate transactions related 
to trades, account inquiries, and market research 
[35]. Unlike TPC-C and TPC-E, TPC-H models the analysis
end of the business environment, where trends are computed
and refined data are produced to support the making of
sound business decisions [35]. For the TPC benchmarks, the
metrics are application-specific. For example, the metric
of TPC-C is the number of New-Order transactions executed
per minute.

In the context of data center computing, HiBench [13],
GridMix2 or GridMix3 [39], and WL Suite [9] have been
proposed to evaluate MapReduce runtime systems. The
workloads are data analysis applications. The metrics are
throughput in terms of the number of tasks per minute [13]
and job running time, which is widely used in batch
queuing systems [16]. YCSB [19] and an extension
benchmark, YCSB++ [10], have been proposed to evaluate
NoSQL systems for scale-out data services. The metrics are
throughput (total operations per second, including reads
and writes) and average latency per request.
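
The two data-serving metrics named above can be computed from a log of completed operations, as in the following sketch. It is not the YCSB client or its output format; the `(kind, latency_seconds)` record layout and the function name `summarize` are assumptions made for illustration.

```python
def summarize(operations, wall_seconds):
    """Return (throughput, average latency) for a list of completed
    operations, each given as (kind, latency_seconds) where kind is
    'read' or 'write' (hypothetical record layout)."""
    total_ops = len(operations)
    # Throughput counts reads and writes together, per second of wall time.
    throughput = total_ops / wall_seconds
    # Average latency is taken over individual requests.
    avg_latency = sum(latency for _, latency in operations) / total_ops
    return throughput, avg_latency


ops = [("read", 0.002), ("write", 0.004), ("read", 0.003), ("write", 0.007)]
throughput, avg_latency = summarize(ops, wall_seconds=2.0)
```

Throughput depends on wall-clock time while latency is averaged per request, so the two can move independently when requests are served concurrently.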

PARSEC is a benchmark suite for studies of
Chip-Multiprocessors (CMPs) [11]. PARSEC includes emerging
applications in recognition, mining, and synthesis (RMS)
as well as systems applications that mimic large-scale
multi-threaded commercial programs. SPEC CPU2006 provides
a snapshot of current scientific and engineering
applications, including a suite of serial programs that is
not intended for studies of parallel machines [11]. The
SPEC CPU2006 component suites include both CINT2006 (the
integer benchmarks) and CFP2006 (the floating point
benchmarks). After the benchmarks are run on the system
under test (SUT), a ratio for each of them is calculated
using the run time on the system under test and a
SPEC-determined reference time [36].
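
The ratio described above can be sketched as follows. The text only says that the ratio is computed from the two times; the direction used here (reference time divided by run time, so that larger is faster) and the geometric mean used to combine per-benchmark ratios reflect SPEC's published practice as we understand it and should be read as assumptions. The function names are hypothetical.

```python
import math


def spec_ratio(reference_seconds, run_seconds):
    """Per-benchmark ratio: SPEC-determined reference time divided by the
    run time measured on the system under test (assumed direction)."""
    return reference_seconds / run_seconds


def overall_score(ratios):
    """Geometric mean of the per-benchmark ratios (assumed aggregation)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```

A system that finishes a benchmark in 250 seconds against a 1000-second reference time gets a ratio of 4 under this convention.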

SPEC also proposed a series of benchmarks for Java
applications [36]. Among them, SPECjvm2008 is a client JVM
benchmark, SPECjbb2005 is a server JVM benchmark, and
SPECjEnterprise2010 is a Java enterprise edition
application server benchmark. SPECjms2007 is the first
industry-standard benchmark for evaluating the performance
of enterprise message-oriented middleware servers based on
JMS (Java Message Service). SPECweb2009 emulates users
sending browser requests over broadband Internet
connections to a web server using both HTTP and HTTPS. It
provides banking, e-commerce, and support workloads, along
with a new power workload based on the e-commerce workload
[36]. SPECsip_Infrastructure2011 is designed to evaluate a
system's ability to act as a SIP server supporting a
particular SIP application [36]. The application is
modeled after a VoIP deployment [36]. The metric is the
number of simultaneously supported subscribers [36].

1) Discussion: We draw four conclusions in this
subsection. First, there is no systematic work on
benchmarking throughput-oriented workloads in data
centers. In Section I, we identified three categories of
HVC workloads. Few previous benchmarks pay attention to
all three categories of applications.

Second, some benchmarking efforts [13] [9] have focused
on MapReduce-based data analysis applications (which
belong to our second category of HVC workloads), and their
metrics target the evaluation of MapReduce runtime
systems. They overlook the diverse programming models in
this field, e.g., Dryad and AllPair [2]; MapReduce is not
a one-size-fits-all solution [2]. For example, MapReduce
or Dryad is not appropriate for applications such as
iterative jobs, nested parallelism, and irregular
parallelism [2].

Third, many previous benchmarks, e.g., the TPC or SPEC
efforts, pay attention to services. Unfortunately,
emerging data-intensive services, such as Web search
engines, are ignored.

Last, apart from SPECsip_Infrastructure2011, little
previous work focuses on interactive real-time
applications, which are important emerging workloads,
since more and more users use streaming media or VoIP for
entertainment or communication.

B. Our ongoing work on metrics and benchmarks. 

Benchmarking is the foundation of evaluating 
HVC systems. However, to be relevant, an HVC 
benchmark suite needs to satisfy a number of prop- 
erties as follows: 

First, the applications in the suite should consider 
a target class of machines [11], that is a data center 
system designed for throughput-oriented workloads, 
not a processor or server in this paper. 

Second, the HVC benchmark suite should represent three
categories of important applications: services, data
processing applications, and interactive real-time
applications. This is exactly the target of our ongoing
DCBenchmarks project (http://prof.ncic.ac.cn/DCBenchmarks).
In this project, we have released a SEARCH benchmark [3].
We will soon release a benchmark for shared data center
systems running data processing applications.

Third, the workloads in the HVC benchmark suite 
should be diverse enough to exhibit the range of 
behavior of the target applications [11]. Meanwhile, 
since service providers may deploy different appli- 
cations, it is important for a service provider to 
customize their chosen benchmarks relevant to their 
applications. 

Fourth, no single metric can measure the performance of
computer systems on all applications [8]. Since different
categories of workloads in HVC have different focuses, we
propose different metrics. For each category of workloads,
we propose an aggregate metric, which measures a data
center system as a whole, and an auxiliary metric to
evaluate energy efficiency, which can be measured at the
level of not only a server but also a data center system.
As shown in Table I, for services, we propose requests per
minute as the aggregate metric and requests per joule as
the auxiliary metric evaluating energy efficiency. For
data processing applications, we propose data processed
per minute as the aggregate metric and data processed per
joule as the auxiliary metric. For interactive real-time
applications, we propose the maximum number of
simultaneous subscribers as the aggregate metric and
subscribers per watt (the ratio of the maximum number of
simultaneous subscribers to the power consumption in a
unit time) as the auxiliary metric. A subscriber here can
be a user or a device. Due to space limitations, we will
report the experimental results in detail in another
paper.
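
The proposed aggregate and auxiliary metrics can be written as small functions, as in the sketch below. It assumes that energy is obtained as average power multiplied by elapsed time and that "power consumption in a unit time" means average power in watts; the function names and the sample figures are hypothetical, not measurements from this work.

```python
def requests_per_minute(requests, seconds):
    """Aggregate metric for services."""
    return requests * 60.0 / seconds


def data_per_minute(bytes_processed, seconds):
    """Aggregate metric for data processing applications."""
    return bytes_processed * 60.0 / seconds


def per_joule(work_units, avg_watts, seconds):
    """Auxiliary energy metric: requests (or bytes) per joule.
    Energy in joules is taken as average power (W) times time (s)."""
    return work_units / (avg_watts * seconds)


def subscribers_per_watt(max_subscribers, avg_watts):
    """Auxiliary metric for interactive real-time applications."""
    return max_subscribers / avg_watts


# Made-up example: 1.2 million requests served in 10 minutes at 4 kW.
rpm = requests_per_minute(1_200_000, 600.0)
rpj = per_joule(1_200_000, 4000.0, 600.0)
spw = subscribers_per_watt(50_000, 4000.0)
```

The same `per_joule` function serves both services and data processing applications, since only the unit of work changes.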

We have also developed several innovative performance
analysis tools [40] [41] [42] that aid in understanding
HVC workloads.

IV. Conclusion 

This paper makes four contributions. For the first time,
we systematically characterized HVC systems and identified
three categories of HVC workloads: services, data
processing applications, and interactive real-time
applications; we compared HVC with other computing
paradigms along six dimensions: levels, workloads,
metrics, coupling


TABLE II: Comparison of different benchmarks and metrics.

| Benchmark | Domains | Level | Workloads | Metrics |
|---|---|---|---|---|
| Linpack [7] | High performance computing | Super computers | Scientific computing code | Floating point operations per second |
| SWaP [24] | Enterprise | Systems | Undefined | Performance/(space * watts) |
| Green 500 [17] | High performance computing | Super computers | Scientific computing code | Flops per watt |
| Graph 500 [20] | High performance computing | Super computers | Computing in the field of graph theory | Traversed edges per second (TEPS) |
| JouleSort [18] | Mobile, desktop, enterprise | Systems | External sort | Records sorted per joule |
| SPECpower_ssj2008 [36] | Enterprise | Systems | SPECjbb2005 | ssj_ops/watt |
| [22] | Storage I/O | Storage systems | Transaction processing or scientific applications | I/O or data rates |
| SPECsfs2008 | Network file systems | File servers | n/a | Operations per second and overall latency of the operations |
| HiBench [13] | Data-intensive scalable computing | MapReduce runtime systems | Data analysis | Job running time and number of tasks completed per minute |
| GridMix2 or GridMix3 [39] | Data-intensive scalable computing | MapReduce runtime systems | Data analysis | Number of completed jobs and running time |
| WL Suite [9] | Data-intensive scalable computing | MapReduce runtime systems | Data analysis | n/a |
| YCSB [19] or YCSB++ [10] | Warehouse-scale computing | NoSQL systems | Scale-out data services | Total operations per second and average latency per request |
| PARSEC [11] | n/a | Chip-Multiprocessors | Recognition, mining, synthesis, and programs that mimic large-scale multithreaded commercial programs | n/a |
| TPC C/E/H [23] | Throughput-oriented workloads | Server systems | Transaction processing and decision support | Application-specific |
| SPEC CPU2006 [36] | Scientific and engineering applications | Processors | Serial programs | A ratio calculated using the run time on the system under test and a SPEC-determined reference time |
| SPECjvm2008, SPECjbb2005, SPECjEnterprise2010, SPECjms2007 [36] | Throughput-oriented workloads | Both hardware and software | Java applications | Throughput (application-specific) |
| SPECsip_Infrastructure2011 [36] | Throughput-oriented workloads | Systems | A VoIP deployment | Simultaneous number of supported subscribers |
| SPECvirt_sc2010 [36] | Throughput-oriented workloads | Systems | SPECweb2005, SPECjAppServer2004, and SPECmail2008 | Performance only and performance per watt |
| SPECweb2009 [36] | Throughput-oriented workloads | Systems | Banking, Ecommerce, and Support | Maximum number of simultaneous user sessions; ratio of the sum of simultaneous user sessions to the sum of watts used |


degree, data scales, and number of jobs or service
instances; we broadly surveyed previous benchmarks and
found no systematic work on benchmarking HVC systems; and
we presented our preliminary work on HVC metrics and
benchmarks.